Indonesian Journal of Electrical Engineering and Computer Science 
Vol. 29, No. 1, January 2023, pp. 421~430 
ISSN: 2502-4752, DOI: 10.11591/ijeecs.v29.i1.pp421-430


An adaptive algorithm based on principal component analysis- 
deep learning for anomalous events detection 


Zainab K. Abbas, Ayad A. Al-Ani 


Department of Information and Communication Engineering, College of Information Engineering, Al-Nahrain University,
Baghdad, Iraq


Article Info

Article history:
Received Aug 13, 2022
Revised Sep 9, 2022
Accepted Sep 19, 2022

Keywords:
Anomaly detection
Bidirectional long short term memory
Deep learning
Machine learning

ABSTRACT

Anomaly detection, the subject of this paper, is one of the most common applications of human
activity detection. Providing security for individuals is a key issue in every community today
because of the constantly growing range of dangerous activities, from planned violence to
accidental harm. Classical closed-circuit television is considered insufficient because it requires
a person to stay awake and monitor the cameras constantly, which is expensive, and a person's
attention decreases after a certain time. For these reasons, an automated security system is
required that can identify suspicious activities in real time and quickly aid victims. Because
activities must be identified with high accuracy and in the shortest possible time, we adopt an
adaptive algorithm based on the combination of machine learning (ML), principal component analysis
(PCA), and deep learning (DL). The UCF-Crime dataset was used for the experiments in this work.
The area under the curve (AUC) of the proposed approach was 94.21%, and the detection accuracy was
88.46% on the test set. The suggested system demonstrated its robustness and achieved the best
accuracy when compared with earlier systems.
1. INTRODUCTION

Video surveillance systems (VSS) are frequently used to increase public safety in various settings,
including malls, hospitals, banks, markets, intelligent cities, educational institutions, and roadways [1]. The
main goals of security applications are typically the accuracy and speed of video anomaly identification [2]. For
the sake of public safety, large numbers of surveillance cameras have recently been installed in many areas
throughout the world [3]. These cameras continuously generate massive volumes of video data [3]. Analyzing this
video in real time and finding abnormal cases therefore requires a lot of human resources, and the process is
error-prone because human attention declines over time [1]. Because human monitoring is ineffective, automatic
anomaly detection solutions based on artificial intelligence (AI) algorithms have become necessary in
surveillance systems [4].

Various techniques in the literature define anomalous actions as "the occurrence of variance in regular
patterns" [3]. Traffic security, automated intelligent visual monitoring, and crime prevention are some
applications of abnormal event detection in surveillance videos [5]. Because real anomalous instances are
scarce, video anomaly detection was formerly treated as a one-class classification task [6]-[8]; that is, the
classifier model is trained on normal videos only, and a video is classified as anomalous when irregular
patterns are seen during testing [5]. As a result, normal behaviors that differ from the normal events in
the training set may be flagged as anomalous, resulting in false alarms [1], [9].

Different studies have been performed in this field, each adopting a strategy for a specific
situation. Waqas et al. [10] provided a framework that can recognize abnormal behavior and tell the user
the type of behavior. Shreyas et al. [11] suggested adaptively compressing each video before it is
transmitted to the event detection system. Anala et al. [12], Hao et al. [13], and Dubey et al. [14] treated
the detection of anomalous behavior as a regression problem. Ullah et al. [5] offered a lightweight
convolutional neural network (CNN). Ullah et al. [3] proposed an intelligent anomaly detection system that
combines ResNet50 with a multilayer bidirectional long short term memory (BiLSTM). Zaheer et al. [15]
presented a weakly supervised anomaly detection method that trains the model using video-level labels.
Another method for handling anomaly detection and classification using a weakly supervised learning model
was provided by Majhi et al. [16]. Wu et al. [17] developed a dual branch network that takes multi-detail
ideas in both the temporal and spatial dimensions as input. To detect video anomalies, Cao et al. [18]
suggested taking into consideration the spatial-temporal relationships between video parts. Abbas and
Al-Ani [2] suggested compressing each video using high-efficiency video coding (H.265) before feeding it
into the anomaly detection system. Abbas and Al-Ani [19] also suggested an algorithm for reducing the size
of the extracted features before anomaly identification.

Even though the word "anomaly" is used frequently in the literature, there is no standard definition of it
yet [9]. Most existing techniques have a high rate of false alarms. Additionally, the effectiveness of
these approaches is reduced in real-world situations, even though they work effectively on
basic databases.

To overcome these difficulties, we suggest reducing the dimensionality of the features extracted by a
pre-trained convolutional neural network (ResNet50) using principal component analysis (PCA) [20], in order
to improve the performance of the model and reduce its complexity. We then feed the reduced features to our
classifier model, which is a BiLSTM. We employ a weakly supervised method based on spatio-temporal features
and BiLSTM to train our classifier model. BiLSTMs have proven to be highly helpful when the context of the
input is needed. In a unidirectional LSTM, information flows in one direction only, from earlier time steps
to later ones, whereas a BiLSTM uses two hidden states so that information also flows from later time steps
to earlier ones. As a result, BiLSTMs are better able to perceive the context [21].
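
The two directions of information flow in a BiLSTM can be illustrated with a small NumPy forward pass. This is only a sketch under stated assumptions: the experiments in this paper were run in MATLAB, the weights below are random rather than trained, the sequence and layer sizes are toy values, and the names `lstm_pass`, `bilstm`, and `init_params` are hypothetical helpers introduced for illustration.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_pass(seq, W, U, b, hidden):
    """Run one LSTM direction over seq (T, D); return hidden states (T, hidden)."""
    h = np.zeros(hidden)
    c = np.zeros(hidden)
    states = []
    for x in seq:
        z = W @ x + U @ h + b               # stacked gate pre-activations, (4*hidden,)
        i, f, o, g = np.split(z, 4)         # input, forget, output gates and candidate
        i, f, o, g = sigmoid(i), sigmoid(f), sigmoid(o), np.tanh(g)
        c = f * c + i * g                   # cell state update
        h = o * np.tanh(c)                  # hidden state update
        states.append(h)
    return np.stack(states)

def bilstm(seq, params_fwd, params_bwd, hidden):
    """Concatenate a forward pass and a time-reversed backward pass."""
    fwd = lstm_pass(seq, *params_fwd, hidden)
    # Process the sequence in reverse, then flip back so time steps line up.
    bwd = lstm_pass(seq[::-1], *params_bwd, hidden)[::-1]
    return np.concatenate([fwd, bwd], axis=1)

def init_params(rng, input_dim, hidden):
    """Random (untrained) weights, for illustration only."""
    W = rng.normal(scale=0.1, size=(4 * hidden, input_dim))
    U = rng.normal(scale=0.1, size=(4 * hidden, hidden))
    b = np.zeros(4 * hidden)
    return W, U, b

rng = np.random.default_rng(0)
T, D, H = 6, 8, 4                           # toy sizes, not the paper's settings
seq = rng.normal(size=(T, D))
p_fwd = init_params(rng, D, H)
p_bwd = init_params(rng, D, H)
out = bilstm(seq, p_fwd, p_bwd, H)
print(out.shape)                            # (6, 8): T steps, 2*H features per step
```

At every time step the first `H` outputs summarize the frames seen so far, while the last `H` outputs summarize the frames still to come, which is what gives the classifier access to context on both sides.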


2. METHODS 

The results of a study that compared the various methods used for anomaly detection showed that 
deep learning (DL) has outperformed other methods in this field [1]. The recommended approach used in this 
work is split into three stages: 
— Feature extraction using Resnet50. 
—  Dimensionality reduction using PCA. 
— Anomaly events detection using BiLSTM. 

This work assesses, for the first time on the UCF-Crime database, a combination of machine learning (ML)
and DL algorithms for anomalous event detection in videos. Figure 1 illustrates the suggested framework.
Figure 1. The suggested framework

2.1. Input database UCF-crime

The UCF-Crime database was used in this work [22], [23]. It comprises 13 categories of anomalies, such as
explosions, fights, abuse, and accidents, in addition to normal events. The collection contains 1900 surveillance
videos, with almost equal numbers of normal and abnormal videos. The training set includes 810 anomalous and 800
normal videos, while the testing set includes 140 anomalous and 150 normal videos [10], [24]. The set includes
almost 129 hours of video with a resolution of 320x240 and 13 million frames, and the videos vary in length [10],
[22], [24]. We selected this database because it includes a variety of anomalous event categories whose
irregularities have a big influence on public safety. However, there are two issues with this database. The first
is that the anomalous class has high inter-class variations. The second is that the labels are video-level, which
means we only know that a video contains an abnormality but not which exact part is abnormal. This may lead to
overfitting [2], [22]. In this work, we selected the videos with a length of at most 2 min. Under this condition,
we had 1324 videos in total, divided as follows: 1116 videos for training (90% for training and 10% for
validation) and the remaining 208 for testing.
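
As a quick check of the split sizes, the arithmetic can be written out in a few lines of Python. The totals come from the text; how the 10% validation share was rounded is not stated, so the use of `round` is an assumption.

```python
total_videos = 1324      # videos no longer than 2 min (from the text)
train_pool = 1116        # videos reserved for training and validation
test_videos = total_videos - train_pool

val_fraction = 0.10
# Assumption: the validation share is rounded to the nearest whole video.
val_videos = round(train_pool * val_fraction)
train_videos = train_pool - val_videos

print(test_videos, train_videos, val_videos)   # 208 1004 112
```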


2.2. Machine learning (ML) and deep learning (DL) 

DL, which is considered a subset of ML, can successfully handle unstructured data, and DL approaches often
outperform conventional ML approaches. It enables computational models to gradually learn and comprehend the
features of the data at various levels. DL became increasingly prevalent as data availability expanded and
powerful computers became available. A DL model passes the input through successive layers, each of which
extracts features and sends them to the next layer. The initial layers capture basic information, which later
layers integrate to provide a complete description. DL can be implemented with a variety of architectures,
including CNNs, pre-trained networks, recurrent neural networks (RNNs), and others. The performance of DL
algorithms keeps improving as the amount of training data increases, whereas the performance of traditional
machine learning algorithms stabilizes after a certain amount of training data. Although deep structures require
more time to train, they perform better than simple artificial neural networks (ANNs). However,
strategies like transfer learning and GPU computing can shorten the training period [1], [2], [25].

A CNN is a type of neural network that contains convolutional layers. Although CNNs process spatial data
well, RNNs are better suited to sequential data, because an RNN uses state variables to store past data and
uses them together with the present inputs to determine the present outputs [2], [26]. CNNs are used mostly in
image processing: learnable weights and biases are assigned to various items in the image so that they can be
told apart. A CNN needs less preprocessing than other classification techniques, since it uses appropriate
filters to extract both temporal and spatial dependencies in an image. Examples of CNN architectures are ZFNet,
VGGNet, ResNet, GoogleNet, AlexNet, and LeNet [2], [25].

RNNs, on the other hand, use prior outputs as inputs to determine the current state. The hidden layers of
an RNN can retain information: the output generated at the previous step is used to update the hidden state.
Because they can remember previous inputs, RNNs can be used to estimate time series. The LSTM is an example
of an RNN [2], [25].


2.2.1. Features extraction 

ResNet50, a 50-layer deep convolutional neural network with 23.5 million learnable parameters, was used
in this study to extract features [3]. The network generates 1000 features for each frame [2]. The input of
ResNet50 is 224 by 224 pixels in size. For this reason, each video frame was center-cropped along its longest
edge and resized to match the input dimensions.
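
The geometry of this preprocessing step can be sketched for the 320x240 frames of UCF-Crime. The paper does not spell out the exact order of operations, so the sketch assumes that the frame is first cropped to a centered square and then scaled to 224 by 224; `center_crop_box` is a hypothetical helper name.

```python
def center_crop_box(width, height):
    """Return (left, top, right, bottom) of the largest centered square."""
    side = min(width, height)
    left = (width - side) // 2
    top = (height - side) // 2
    return left, top, left + side, top + side

target = 224                                  # ResNet50 input size
box = center_crop_box(320, 240)               # UCF-Crime frame resolution
scale = target / (box[2] - box[0])            # resize factor applied after cropping

print(box)                                    # (40, 0, 280, 240)
print(round(scale, 4))                        # 0.9333
```

Under this assumption, 40 columns are discarded on each side of the frame and the remaining 240x240 square is shrunk slightly to fit the network input.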


2.2.2. Dimensionality reduction - principal component analysis (PCA) 

This technique appears in the literature under different names, such as singular value decomposition
(SVD), the Hotelling transform, the Karhunen-Loeve transform (KLT), and the empirical orthogonal function
(EOF) method [27]. It is an old, straightforward statistical approach that maps data from a higher dimension
to a lower dimension in the hope of gaining a better understanding of the data. PCA is used for data
compression, dimensionality reduction, and data visualization. In dimensionality reduction, it greatly
simplifies the problem by drastically lowering the number of features: if the original dataset has x features,
the new dataset will have y features, where y is less than x. Some information is lost because the new
dimension is smaller than the original one. PCA can aid in the discovery of factors buried deep within the
data. The principal components are uncorrelated linear combinations of the original data, and in some cases
two or three of them can account for 90% of the original signal. In machine learning, PCA is an unsupervised
learning approach for dimensionality reduction. It is a statistical technique that converts observations of
correlated features into a set of linearly uncorrelated features via an orthogonal transformation. These newly
transformed features are called the principal components (PCs) [20], [27], [28]. Figure 2 shows the steps of
PCA applied in this work [27].
Figure 2. The PCA steps
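
The PCA steps can be sketched in NumPy as follows: center the data, find the principal axes, keep the smallest number of components whose cumulative explained variance reaches a chosen threshold, and project. This is a minimal sketch, not the MATLAB implementation used in this work. It assumes a (samples x features) layout, computes the axes through an SVD of the centered data, and uses synthetic data; `pca_by_variance` is a hypothetical name.

```python
import numpy as np

def pca_by_variance(features, variance=0.95):
    """Project features (n_samples, n_features) onto the fewest principal
    components whose cumulative explained variance reaches `variance`.
    Returns (projected_features, number_of_components)."""
    centered = features - features.mean(axis=0)
    # Rows of vt are the principal axes; s**2 is proportional to the
    # variance captured along each axis.
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    explained = s**2 / np.sum(s**2)
    cumulative = np.cumsum(explained)
    # First index at which the cumulative ratio reaches the threshold.
    k = int(np.searchsorted(cumulative, variance)) + 1
    k = min(k, len(s))                      # guard against rounding at 100%
    return centered @ vt[:k].T, k

# Synthetic example: 200 samples of 20 correlated features driven by
# two latent factors plus a little noise (illustrative data only).
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 2)) * np.array([10.0, 1.0])
mixing = rng.normal(size=(2, 20))
X = latent @ mixing + 0.01 * rng.normal(size=(200, 20))

Z90, k90 = pca_by_variance(X, 0.90)
Z9999, k9999 = pca_by_variance(X, 0.9999)
print(k90, k9999, Z90.shape, Z9999.shape)
```

Raising the variance threshold can only keep the same number of components or add more, which is why the number of PCs per video grows with the variance value.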


2.2.3. Classifier 

Unlike previous studies, which fed the classifier model with the features produced by the feature
extraction stage, in this work we fed the classifier model with the features obtained from the dimensionality
reduction stage. The BiLSTM is considered suitable for long sequential data such as videos [2]. We therefore
used the BiLSTM as the classifier model for anomaly detection.


3. RESULTS AND DISCUSSION 

The code in this research was implemented in the MATLAB software environment (version 2021a). The work
was done on a computer with the following specifications: Windows 10 (64-bit operating system), Intel Core
i7 processor, 1 TB SSD, 16 GB RAM, and an NVIDIA GeForce MX450 graphics processing unit. The results of each
stage were as follows:


3.1. Input dataset UCF-crime 

The UCF-Crime database was used for this study's experiments. The 13 anomaly classes were used in
addition to the normal class. Since anomaly identification in this work is done at the video level, the
duration of the video had no bearing on the functioning of anomaly detection, so we chose videos that were no
longer than two minutes. Under this condition, we had 1324 videos in total, divided as follows: 1116 videos
for training (90% for training and 10% for validation) and the remaining 208 for testing.


3.2. Machine learning (ML) and deep learning (DL) 
3.2.1. Features extraction 

In this study, feature extraction was accomplished using a pre-trained model called ResNet50. The
features are taken from the fc1000 layer rather than from the SoftMax layer. This means that after this step
each frame in the video is represented by 1000 features, i.e., a video with x frames yields 1000 * x features,
which is a large amount of data. For this reason, in this paper we suggest for the first time feeding the
extracted features to a PCA model before classifying them, to reduce the training time and at the same time
increase the accuracy of the classification.
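
To give a sense of scale, the feature volume for a single video can be estimated from the dataset totals quoted in Section 2.1. The sketch assumes that the dataset-wide average frame rate applies to an individual two-minute video, which is only a rough approximation.

```python
frames_total = 13_000_000       # frames in UCF-Crime (from Section 2.1)
hours_total = 129               # approximate total duration (from Section 2.1)
features_per_frame = 1000       # size of the fc1000 output

# Assumption: every video runs at the dataset-wide average frame rate.
avg_fps = frames_total / (hours_total * 3600)
frames_2min = round(avg_fps * 120)
features_2min = features_per_frame * frames_2min

print(round(avg_fps, 2))        # 27.99
print(frames_2min)              # 3359
print(features_2min)            # 3359000
```

Under this estimate, a single two-minute video yields more than three million feature values before dimensionality reduction.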


3.2.2. Dimensionality reduction - principal component analysis (PCA) 

In this stage, the PCs are calculated for the extracted features; these newly transformed features are fed
to the classifier model instead of the features extracted by ResNet50. In this work, the PCs for different
variance values have been calculated for each video in the database. Figures 3 and 4 show the number of PCs for
all videos at different variance values for the training and test data of the UCF-Crime dataset, respectively.
The x-axis represents the video number, and the y-axis represents the number of PCs of the video.
Figure 3. The video’s PC number for train datasets at different variance values

Figure 4. The video’s PC number for test datasets at different variance values

3.2.3. Classifier

Once the features are ready, a classifier is needed to make the decision. In this work, the BiLSTM
classification model has been used. The parameters of the classifier model for the different variance values
are shown in Table 1. The Adam optimizer was used in all cases. The values of L2Regularization and dropout
were selected to lessen the model's overfitting. The model parameters were selected by trial and error.


Table 1. The parameter values of the classifier model at different variance values
Classifier parameters     Variance value
                          99%    95%    90%    85%    80%    75%    70%    65%    60%    55%
Minimum Batch Size        32     32     32     32     32     32     32     32     32     32
Hidden layer nodes No.    112    112    112    112    112    112    112    112    112    112
Dropout                   0.8    0.8    0.8    0.8    0.8    0.8    0.8    0.8    0.8    0.8
Initial Learning Rate     1e-5   1e-5   1e-5   1e-5   1e-5   1e-5   1e-5   1e-5   1e-5   1e-5
Maximum epochs            170    170    190    170    170    170    170    170    170    170
L2Regularization          0.8    0.8    0.8    0.8    0.8    0.8    0.8    0.8    0.8    0.8


Following previous research, the area under the curve (AUC) and the receiver operating characteristic
(ROC) curve were used as performance measures to assess the suggested work. Moreover, we determined the
detection accuracy of our classifier model, which was calculated using (1) [29].


$$\text{accuracy} = \frac{\text{number of correct predictions}}{\text{total number of predictions}} \tag{1}$$
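
For readers less familiar with the ROC and AUC measures, the following sketch shows one common way to compute them: sweep the decision threshold over the classifier scores, record the false positive and true positive rates, and integrate with the trapezoid rule. The labels and scores are made-up illustrative values, not results from this work, and `roc_curve` and `auc` are hypothetical helper names.

```python
def roc_curve(labels, scores):
    """Return ROC points (fpr, tpr) from binary labels (1 = anomaly) and scores."""
    pos = sum(labels)
    neg = len(labels) - pos
    ranked = sorted(zip(scores, labels), reverse=True)
    points = [(0.0, 0.0)]
    tp = fp = 0
    for idx, (score, label) in enumerate(ranked):
        if label == 1:
            tp += 1
        else:
            fp += 1
        # Emit a point only after the last sample sharing this score.
        if idx + 1 == len(ranked) or ranked[idx + 1][0] != score:
            points.append((fp / neg, tp / pos))
    return points

def auc(points):
    """Area under a piecewise-linear curve via the trapezoid rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area

# Hypothetical scores for six videos (three anomalous, three normal).
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
area = auc(roc_curve(labels, scores))
print(round(area, 4))   # 0.8889
```

In this toy example, eight of the nine anomalous-normal pairs are ranked correctly, so the AUC is 8/9.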

The experimental results show the efficiency of our proposed framework, as it detects anomalous
events with greater precision than existing methods. For each variance value, Figures 5-14 show our classifier
model's ROC curve on the left side and the confusion matrix on the right. The highest AUC was recorded at a
variance value of 90%, while the highest detection accuracy was recorded at a variance value of 95%. In the
latter case, the true positives (TP), meaning anomalies classified as anomalies, numbered 84; the false
negatives (FN), meaning anomalies classified as normal, numbered 9; the true negatives (TN), meaning normal
videos classified as normal, numbered 100; and the false positives (FP), meaning normal videos classified as
anomalies, numbered 15.
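
The reported detection accuracy can be reproduced directly from these confusion-matrix counts using (1). The true positive and false positive rates computed alongside it are derived here for illustration and are not quoted in the text.

```python
tp, fn, tn, fp = 84, 9, 100, 15          # counts reported at variance = 95%

total = tp + fn + tn + fp                # all test videos
accuracy = (tp + tn) / total             # equation (1)
tpr = tp / (tp + fn)                     # share of anomalies detected
fpr = fp / (fp + tn)                     # share of normal videos flagged

print(total)                             # 208
print(f"{accuracy:.2%}")                 # 88.46%
print(f"{tpr:.2%} {fpr:.2%}")            # 90.32% 13.04%
```

The total of 208 matches the size of the test set, and the accuracy matches the 88.46% reported for the proposed system.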
Figure 5. The classifier model ROC curve and the confusion matrix for variance = 99%
Figure 6. The classifier model ROC curve and the confusion matrix for variance = 95%
Figure 7. The classifier model ROC curve and the confusion matrix for variance = 90%
Figure 8. The classifier model ROC curve and the confusion matrix for variance = 85%
Figure 9. The classifier model ROC curve and the confusion matrix for variance = 80%
Figure 10. The classifier model ROC curve and the confusion matrix for variance = 75%
Figure 11. The classifier model ROC curve and the confusion matrix for variance = 70%
Figure 12. The classifier model ROC curve and the confusion matrix for variance = 65%
Figure 13. The classifier model ROC curve and the confusion matrix for variance = 60%
Figure 14. The classifier model ROC curve and the confusion matrix for variance = 55%

Having described the suggested system in detail, we now compare it with previous studies. Table 2
compares the AUC scores with those of previous research works; the proposed system achieved the highest AUC,
94.21%.


Table 2. AUC score comparison between the proposed work and the previous works
Method                     AUC %
Waqas et al. [10]          75.41
Anala et al. [12]          85
Shreyas et al. [11]        79.8
Hao et al. [13]            81.22
Dubey et al. [14]          81.91
Ullah et al. [5]           78.43
Ullah et al. [3]           85.53
Zaheer et al. [15]         78.27
Majhi et al. [16]          82.12
Wu et al. [17]             87.65
Cao et al. [18]            83.14
Abbas and Al-Ani [2]       90.16
Abbas and Al-Ani [19]      93.61
Our Adaptive Algorithm     94.21


4. CONCLUSION


The proposed anomalous event detection system, based on a combination of ML and DL, was designed and
tested. The feature vector for each video was extracted using a pre-trained ResNet50. After that, the feature
reduction algorithm PCA was applied to reduce the dimensionality of the features, which was done for the first
time in such work. Finally, the new feature vectors were fed into a BiLSTM to distinguish abnormal from normal
classes. In comparison with previous anomaly detection approaches, the suggested system has been shown to have
superior accuracy. According to the experimental results, the AUC value for the UCF-Crime database reached
94.21%. We also measured the classifier's detection accuracy, which came out to be 88.46%. This demonstrates
the effectiveness of our suggestion for increasing the accuracy of anomalous event detection while keeping both
false negatives and false positives low. Future work will focus on investigating different feature extraction
models, feature selection techniques, and dimensionality reduction techniques and combining them with our
proposed system to improve accuracy.